[None][fix] Fix ValueError and missing decoding statistics for MTP #12063
cascade812 wants to merge 1 commit into NVIDIA:main
Conversation
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
📝 Walkthrough
Two localized adjustments: (1) increased the MTP slot pool size by one to accommodate a CUDA graph dummy request; (2) enhanced the draft length calculation with fallback logic from num_nextn_predict_layers when max_draft_len is absent.
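To illustrate change (1), here is a minimal, hypothetical sketch (not the TensorRT-LLM implementation) of why a slot pool sized exactly to max_batch_size runs out once CUDA graph capture consumes an extra dummy request, and why sizing it one larger avoids the "No free slots" error:

```python
# Illustrative sketch only: a toy slot pool showing the off-by-one.
class SlotPool:
    def __init__(self, num_slots: int):
        self.free = list(range(num_slots))

    def acquire(self) -> int:
        if not self.free:
            raise ValueError("No free slots")
        return self.free.pop()

max_batch_size = 2

# Pool sized to max_batch_size: the CUDA graph dummy request uses up a
# slot, so the last real request fails.
pool = SlotPool(max_batch_size)
pool.acquire()  # dummy request for CUDA graph capture
pool.acquire()  # real request 1
try:
    pool.acquire()  # real request 2 -> raises "No free slots"
    exhausted = False
except ValueError:
    exhausted = True

# Pool sized to max_batch_size + 1: the dummy request has its own slot.
pool2 = SlotPool(max_batch_size + 1)
slots = [pool2.acquire() for _ in range(max_batch_size + 1)]
```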
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/bench/dataclasses/reporting.py`:
- Around line 781-784: The code treats falsy 0 as "missing" and only applies the
new dict-backed fallback, causing object-based configs to skip decoding stats;
update the logic around spec_config to (1) detect if spec_config is a dict and,
if so, use "if 'max_draft_len' in spec_config: return
spec_config['max_draft_len']" (allow 0), else if "num_nextn_predict_layers" in
spec_config return that value (again allowing 0), and (2) for non-dict (object)
configs, use attribute checks like "hasattr(spec_config, 'max_draft_len') and
spec_config.max_draft_len is not None" to return the explicit attribute
(including 0) or fall back similarly to num_nextn_predict_layers; in short, stop
using boolean "or" chains and use explicit key/attribute existence and "is not
None" checks for spec_config.max_draft_len and
spec_config.num_nextn_predict_layers.
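The suggestion above can be made concrete. The sketch below is a runnable version of the reviewer's proposed logic; get_draft_len and FakeMTPConfig are hypothetical names for illustration, not the actual identifiers in reporting.py:

```python
# Hypothetical helper implementing the reviewer's suggestion: explicit
# key/attribute checks instead of boolean `or` chains, so an explicit
# max_draft_len=0 is honored rather than treated as missing.
def get_draft_len(spec_config) -> int:
    if spec_config is None:
        return 0
    # Dict-backed configs (e.g. loaded from YAML): an explicit key wins,
    # including an explicit 0.
    if isinstance(spec_config, dict):
        if "max_draft_len" in spec_config:
            return spec_config["max_draft_len"] or 0
        return spec_config.get("num_nextn_predict_layers") or 0
    # Object-backed configs: use `is not None` so max_draft_len=0 is kept.
    draft_len = getattr(spec_config, "max_draft_len", None)
    if draft_len is not None:
        return draft_len
    return getattr(spec_config, "num_nextn_predict_layers", 0) or 0

class FakeMTPConfig:
    # Stand-in for an object-based speculative config where
    # max_draft_len was never set explicitly.
    max_draft_len = None
    num_nextn_predict_layers = 3
```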
📒 Files selected for processing (2)
tensorrt_llm/_torch/speculative/mtp.py
tensorrt_llm/bench/dataclasses/reporting.py
draft_len = (spec_config.get("max_draft_len")
             or spec_config.get("num_nextn_predict_layers"))
return draft_len or 0
return spec_config.max_draft_len or 0
Handle object configs and explicit zero values separately.
The new fallback only applies to dict-backed configs. Line 784 still reads spec_config.max_draft_len only, so object-based MTP configs can keep returning 0 and skip decoding stats. The or chain also treats an explicit max_draft_len=0 as “missing”, which changes the meaning of a configured value.
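The zero-as-missing pitfall can be shown in two lines (variable names here are illustrative): `0 or x` evaluates to `x`, so an explicitly configured zero silently falls through to the fallback, whereas an explicit key check preserves it.

```python
# `0 or x` evaluates to x, so the `or` chain discards an explicit 0.
spec_config = {"max_draft_len": 0, "num_nextn_predict_layers": 3}

buggy = (spec_config.get("max_draft_len")
         or spec_config.get("num_nextn_predict_layers"))
# buggy is 3, even though max_draft_len was explicitly set to 0.

explicit = (spec_config["max_draft_len"]
            if "max_draft_len" in spec_config
            else spec_config.get("num_nextn_predict_layers", 0))
# explicit is 0, honoring the configured value.
```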
Proposed fix
if ("speculative_config" in self.kwargs
and self.kwargs["speculative_config"] is not None):
spec_config = self.kwargs["speculative_config"]
# Handle both dict (from YAML) and object types
if isinstance(spec_config, dict):
- draft_len = (spec_config.get("max_draft_len")
- or spec_config.get("num_nextn_predict_layers"))
- return draft_len or 0
- return spec_config.max_draft_len or 0
+ if "max_draft_len" in spec_config:
+ return spec_config["max_draft_len"] or 0
+ return spec_config.get("num_nextn_predict_layers") or 0
+ draft_len = getattr(spec_config, "max_draft_len", None)
+ if draft_len is not None:
+ return draft_len
+ return getattr(spec_config, "num_nextn_predict_layers", 0) or 0
zheyuf left a comment: Thanks. Looks good to me.
/bot run
PR_Github #38389 [ run ] triggered by Bot. Commit:
PR_Github #38389 [ run ] completed with state
Description
This PR addresses the following two issues for MTP:
1. ValueError: No free slots when enabling use_relaxed_acceptance_for_thinking or use_sa_spec for MTP with max_batch_size > 1.
2. Missing decoding statistics when max_draft_len is absent from the speculative config.
Test Coverage
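For context on the second issue, here is a hypothetical sketch (not the actual reporting.py code) of why a draft length that resolves to 0 makes decoding statistics vanish: reporters typically guard the stats computation on a positive draft length.

```python
# Hypothetical reporter guard: a zero draft_len skips the statistics
# entirely, so a wrong fallback to 0 silently disables them.
def report_decoding_stats(draft_len: int, accepted_per_step: float):
    if draft_len <= 0:
        return None  # decoding statistics silently skipped
    return {"acceptance_rate": accepted_per_step / draft_len}
```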
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.
Summary by CodeRabbit
Bug Fixes
Refactor